(57) Abstract

The hPMS2 gene encodes a protein which is involved in DNA mismatch repair and is mutated in a subset of patients with hereditary
nonpolyposis colon cancer (HNPCC). The previously published hPMS2 cDNA sequence lacks an upstream in-frame stop codon preceding
the presumptive initiating methionine. To further evaluate the 5' terminus of the hPMS2 coding region, we isolated additional cDNA
clones, RT-PCR products, and the corresponding 5' genomic segment of the hPMS2 locus. The hPMS2 gene transcripts were found to
have heterogeneous but collinear termini, one of which contained an in-frame termination codon preceding the initiating methionine. In
addition, a gene encoding a 34.5 kDa polypeptide was found to transcriptionally initiate within hPMS2 from the opposite strand.
peptides from the 85 kDa protein revealed it to be the product of hMLH1, and this
protein's molecular weight agreed with that predicted from the cDNA sequence
(Bronner et al., 1994; Papadopoulos et al., 1994). The sequence of the peptide
generated from the 110 kDa component showed it to be similar to the hPMS2
mutL homolog; however, the predicted molecular weight of hPMS2 is only 95 kDa
(Nicolaides et al., 1994). Since the previously isolated hPMS2 cDNA clones
lacked an in-frame termination codon upstream of the presumptive initiating
methionine, it was possible that the open reading frame extended further upstream.
Thus there is a need in the art for further knowledge of the genetic structures of
and adjacent to the known hPMS2 gene.

SUMMARY OF THE INVENTION

It is an object of the invention to provide a novel, isolated, human gene on
chromosome 7.

It is an object of the invention to provide vectors and host cells for making
a novel human gene product.

It is another object of the invention to provide compositions of matter
containing the human gene product.

These and other objects are provided by one or more of the embodiments
described below. In one embodiment of the invention, a segment of cDNA is
provided. The cDNA consists of the sequence of nucleotides shown in Figure 2.

According to another embodiment of the invention, a vector comprising the
segment of cDNA which consists of the sequence of nucleotides shown in Figure
2 is provided, as well as host cells comprising the vector.

According to still another embodiment of the invention, a composition is
provided. The composition consists essentially of a protein consisting of the amino
acid sequence shown in Figure 2.

In yet another embodiment of the invention a composition of protein JTV1
as shown in Figure 1 is provided. The composition is free of other human
proteins.

In another embodiment of the invention a segment of cDNA is provided
which segment encodes the amino acid sequence of JTV1 protein shown in
Figure 2.

cDNA probes are also provided by the present invention. The cDNA
portion of said probes consists of between 15 and 1176 contiguous nucleotides of
the sequence shown in SEQ ID NO:1.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 shows the sequence of the 5' region of hPMS2 and predicted
coding region. The arrow indicates the 5' end of the previously published cDNA
clone. The presumptive initiating methionine is underlined.

Figure 2 shows the sequence of JTV1. The sequence has been deposited
in Genbank, accession number U24169. The presumptive initiating methionine is
underlined.

Figure 3 demonstrates the genomic localization of JTV1. The genomic
localization of hPMS2 and JTV1 were confirmed by screening somatic-cell hybrids
containing various regions of human chromosome 7. Lane 1, GM10791 contains
entire chromosome 7 in a Chinese hamster ovary (CHO) background; lane 2,
NA11440 contains 7pter>7p22 in a CHO background; lane 3, Ru-Rag4-13
contains 7cen-7pter in a murine background; lane 4, 4AF1/106/K015 contains
7cen-qter in a murine background; lane 5, GM05184.17 contains 7q21.2-qter in
a CHO background; lane 6, 2068Rag22-2 contains 7q22-qter in a murine
background; lane 7, human genomic DNA; lane 8, mouse genomic DNA; lane 9,
CHO genomic DNA.

Figure 4 demonstrates the mapping of transcriptional start sites of hPMS2
and JTV1. Sequence of the genomic region containing the 5' ends of the two
genes is shown. The sequence is numbered with respect to codon 1 of hPMS2.
Lower case letters denote intronic sequence of JTV1 (from nt -470 to -833) and
hPMS2 (from +24 to +108). Arrows indicate the 5' ends of hPMS2 (sense
strand) and of JTV1 (antisense strand) cDNA clones. The underlined ATG codons
indicate the predicted initiating methionines for hPMS2 (at nt +1 on the sense
strand) and JTV1 (at nt -345 on the antisense strand). The sequence has been
deposited in Genbank, accession number U24168.

Figure 5 shows the expression of hPMS2 and JTV1. RNA from various
tissues was incubated with reverse transcriptase (RT+) or in control reactions
without reverse transcriptase (RT-). The cDNA was used as template for PCR
with primers specific for hPMS2 (A) and JTV1 (B). RT-PCR products were
separated by polyacrylamide gel electrophoresis.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

To investigate the upstream region from hPMS2, we isolated additional
cDNA clones, analyzed the 5' end of hPMS2 transcripts with PCR-based
techniques, and cloned the corresponding genomic segments. In addition to
clarifying the transcript, we serendipitously discovered a previously undescribed
gene overlapping hPMS2. That gene is termed herein JTV1. The sequences of the
JTV1 cDNA and protein are shown in SEQ ID NOS:1 and 2, respectively.

A segment of cDNA according to the present invention refers to a
contiguous stretch of deoxyribonucleotides which have a sequence as obtained upon
reverse transcription of an RNA transcript. Such segments do not contain introns.
The segment may be an isolated molecule or it can be covalently joined to other
nucleic acid sequences. The segment may, for example, be replicated as part of
a vector, such as a plasmid, virus, or minichromosome. The vector may be
replicated within a host cell, such as a cell transformed by a recombinant DNA
molecule. The host cell may be used to produce JTV1 protein. It can also be
used to study regulation of expression of JTV1 sequences, for example by
subjecting the host cell to various agents which may or may not affect the
expression. Although the DNA sequence is discussed with particularity herein, it
is well within the skill of the art to make small mutations, such as single nucleic
acid substitutions of one of the other three nucleic acid bases, at any of the
positions of the sequence. In addition, it is well within the art to make single base
deletions or single base insertions, to study the effect upon protein structure and
function.

If JTV1 is produced in a recombinant host cell which is not human, a
composition of JTV1 protein will be produced which is free of other human
proteins. If JTV1 protein is isolated from naturally producing cells, or from
human host cells, then the protein can be purified, for example, using antibodies
which are raised against an immunogen comprising JTV1 amino acid sequence.
Any other means of purification known in the art can be used, as is desired.

DNA molecules can be made having different nucleotide sequences from
that disclosed in SEQ ID NO:1, but which still encode the JTV1 protein as
disclosed in SEQ ID NO:2. Using the known coding relationships between codons
and amino acids and the disclosed amino acid sequence, numerous other sequences
can be readily designed and produced. Such DNA molecules are within the
contemplation of the subject invention.
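
The degeneracy described above can be illustrated with a minimal sketch. It assumes the standard genetic code (written here as the conventional 64-letter string in T, C, A, G order) and one-letter amino acid codes; the function names are hypothetical and chosen only for illustration. The example peptide MPMYQ corresponds to the first five residues (Met Pro Met Tyr Gln) listed for JTV1 in the sequence listing.

```python
from itertools import product
from math import prod

# Standard genetic code, codons enumerated in T, C, A, G order (assumption:
# no alternative translation table applies).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Invert the table: amino acid -> list of synonymous codons.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def count_encodings(peptide: str) -> int:
    """Number of distinct DNA sequences that encode the given peptide."""
    return prod(len(SYNONYMS[aa]) for aa in peptide)


def all_encodings(peptide: str):
    """Yield every DNA sequence encoding the peptide (short peptides only)."""
    for combo in product(*(SYNONYMS[aa] for aa in peptide)):
        yield "".join(combo)


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    print(count_encodings("MPMYQ"))
```

Because the count is a product over residues, it grows very quickly with peptide length, which is why enumeration is only practical for short stretches.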

cDNA probes can be used for hybridization studies. Typically they are
labeled with a detectable marker, such as a radiolabel or a fluorescent moiety,
although they need not be. The cDNA probes of the subject invention consist of
at least 15 contiguous nucleotides of the sequence shown in SEQ ID NO:1. If
greater specificity is desired, larger molecules of 18, 20, 25, or 30 nucleotides can
be used, up to a maximum of the entire sequence of 1176 nucleotides.
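
As a rough illustration of what "contiguous nucleotides" of a sequence means in practice, the sketch below enumerates fixed-length windows and counts how many start/length combinations exist between a minimum length and the full sequence. The function names and the short demonstration string are hypothetical; the real input would be the sequence referenced above.

```python
def probes_of_length(seq: str, k: int):
    """Yield every contiguous k-nucleotide window of seq, left to right."""
    for start in range(len(seq) - k + 1):
        yield seq[start:start + k]


def count_probes(seq_len: int, min_len: int = 15) -> int:
    """Count start/length choices for contiguous probes of min_len..seq_len nt.

    A probe of length k can start at (seq_len - k + 1) positions.
    """
    if seq_len < min_len:
        return 0
    return sum(seq_len - k + 1 for k in range(min_len, seq_len + 1))


if __name__ == "__main__":
    demo = "ACGTACGTACGTACGTACGT"  # hypothetical 20-nt sequence for illustration
    print(list(probes_of_length(demo, 18)))
    print(count_probes(len(demo)))
```

The count treats windows at different positions as distinct even if their sequences happen to coincide.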

JTV1 cDNAs can be used as probes to detect deletions in chromosome 7.
Due to the overlapping promoter regions, large deletions of JTV1 would also be
expected to affect PMS2 expression, leading to Hereditary Non-Polyposis
Colorectal Cancer (HNPCC). JTV1 cDNA can be used in chromosome mapping.
It can also be used to assay activity or competence of the PMS2 promoter region.
The presence of JTV1 transcripts or JTV1 protein suggests that the PMS2 promoter
is intact. If the PMS2 promoter is intact and PMS2 products are absent, a
structural defect in the coding region is indicated.

JTV1 sequences can be used to guide homologous recombination at the
PMS2 locus. For example, where a PMS2 mutation is present and therapeutic
replacement with a wild-type gene is desired, PMS2 sequences can be used to
provide an adjacent region of homology. Similarly, it may be desirable to target
other genes to the region adjacent to PMS2. JTV1 sequences can be used to flank
such other genes, providing one or more regions of homology. If insertion of
other genes is desired between the JTV1 and the PMS2 sequences, again, this can
be accomplished using the identified sequences as homology units for homologous
recombination.

Examples 
Example 1

Isolation and sequence analysis of the 5' end of hPMS2.

Purified DNA from P1 clone 53, previously determined to contain the
hPMS2 gene (Nicolaides et al., 1994), was digested with EcoRI and subcloned
into the pBluescript vector (Stratagene). Clones containing the 5' region of hPMS2
were identified by hybridization with primer A (Table 1) directed to exon 1.
Restriction analysis of several positive clones showed them to be identical. The
sequence of the relevant region of hPMS2 was determined from both strands using
a-dATP and Sequenase (USB).

Table 1. Primers used for hPMS2.

PRIMER NAME   STRAND      PRIMER SEQUENCE                    POSITION*
A             sense       5'-cgggtgttgcatccatgg-3'           -14 - +4
B             sense       5'-gggtggagcacaacgtcg-3'           -110 - -93
C             sense       5'-ggtcacgacggagaccg-3'            -283 - -267
D             sense       5'-tgcaggtgggaagctccacacgg-3'      -414 - -392
E             sense       5'-tagctcctgccgtgcacg-3'           -448 - -431
F             sense       5'-cgctcctacctgcacgtg-3'           -487 - -470
G             antisense   5'-tagactcagtaccacctgc-3'          +90 - +107
H             sense       5'-tacagaacctgctaaggcc-3'          +24 - +42
I             antisense                                      +116 - +136
J             sense       5'-caaccatgagacacatcgc-3'          +2545 -
K             antisense   5'-aggttagtgaagactctgtc-3'         +2647 - +2666

* Relative to the presumptive initiating methionine in Figure 1.



Three clones were isolated, each containing an 8.5 kb EcoRI insert. Partial
sequence analysis of one clone, pSMN, determined that it contained coding
residues of hPMS2 as well as sequences upstream of the previously designated
codon 1. The presumptive initiating codon reported previously has been
designated as nucleotide 1 in Figure 1. The sequence of hPMS2 was extended 833
bp upstream of nucleotide 1. This sequence revealed an in-frame stop codon 321
nts upstream of the published initiator methionine, with no intervening methionines
(Figure 1).

Example 2

Isolation of additional cDNA clones using hPMS2 probes.

Two cDNA libraries were screened with a probe containing nt +24 to
+136 of hPMS2 generated by PCR using P1 clone 53 as template and the primers
H and I (Table 1). A human small intestine random-primed cDNA library in
XGTl^ (Clontech) and a HeLa oligo-dT primed cDNA library in λZAP II
(Stratagene) were screened as described, except hybridizations were carried out at
68°C and filters were washed at 65°C for 0.5 hrs (Kinzler and Vogelstein, 1989).
Following plaque purification, the EcoRI inserts from the small intestine library
were subcloned into pBluescript vector, while the HeLa cDNA inserts were
rescued as phagemids following the manufacturer's protocol (Stratagene).

One clone was isolated from the random-primed small intestine library, and
this contained nt -14 to nt +1668 of hPMS2. Two clones were isolated from the
oligo-dT primed HeLa cDNA library. The clones began at nt -53 and ended at
either nt +2722 or +2749. The HeLa cDNA library was also screened with a
430 bp probe from the 5' genomic region of hPMS2, containing nt -414 to +16,
generated by PCR from P1 clone 53 using primers D (Table 1) and O (Table 2).
The same two clones were identified, as expected. However, twelve other
overlapping clones were found and appeared to represent a different transcript,
named JTV1 (Figure 2). These twelve cDNAs were approximately 1.2 kb in
length and were sequenced in their entirety. All twelve ended with a polyA tract
(assumed to be the 3' end) and were identical for 1.2 kb upstream. The 5' ends
were located within 38 bp of each other. Comparison with hPMS2 indicated that
JTV1 was transcribed from the opposite strand.
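
The nucleotide positions used in this description are numbered relative to the initiating ATG, with the A as +1 and the preceding base as -1. The sketch below assumes that no position 0 exists in this scheme, an inference that is consistent with the 430 bp probe spanning nt -414 to +16 and with the 18-nt primer A spanning -14 to +4. The function name is hypothetical.

```python
def span_length(start: int, end: int) -> int:
    """Length in nt of the interval start..end in ATG-relative numbering.

    Assumes the A of the ATG is +1 and the base before it is -1,
    so there is no position 0 (an assumption inferred from the text).
    """
    if start == 0 or end == 0:
        raise ValueError("position 0 does not exist in this numbering")
    if end < start:
        raise ValueError("end must not precede start")
    length = end - start + 1
    if start < 0 < end:
        length -= 1  # the interval crosses the -1/+1 boundary
    return length


if __name__ == "__main__":
    print(span_length(-414, 16))   # probe from the 5' genomic region
    print(span_length(24, 136))    # probe generated with primers H and I
```

Checking stated lengths against stated coordinates in this way is a quick test for transcription errors in primer tables.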

Table 2. Primers used for JTV1 cDNA amplification.

PRIMER NAME   STRAND      PRIMER SEQUENCE                    POSITION*
L             sense       5'-gttctgccatgccgatg-3'            -8 - +9
M             sense       5'-ggcctttggcacgcgctac-3'          -23 - -41
N             sense       5'-accggactgcgttacccg-3'           -111 - -129
O             sense       5'-tctcagctcgctccatgg-3'           -343 - -360
P             antisense   5'-gcagagacaggttagactc-3'          +139 - +157
Q             sense       5'-gctccttaagtgaattgccg-3'         +952 - +971
R             antisense   5'-tgacacttgacaactggcc-3'          +1068 - +1086

* Relative to the presumptive initiating methionine in Figure 2.



Example 3

The length of one clone representative of JTV1 (pM23NNFL) was 1233 bp
and encoded an open reading frame (ORF) of 936 bp (Figure 2). The first
methionine within this ORF was designated codon 1 (Figure 2) and was preceded
by an in-frame termination codon 66 bp upstream. This methionine had a
reasonable match to the Kozak translation initiation consensus (Kozak, 1986). The
3' end contained a polyadenylation signal (AAUAAA) starting at nucleotide 1086
followed by a polyA tail. The transcript was predicted to encode a polypeptide of
312 amino acids, with a molecular weight of 34.5 kDa. Searches of nucleotide and
peptide sequence databases showed that this was a novel gene, with limited
homology to the glutathione S-transferase gene family.
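
The relationship between the ORF length, the number of residues, and the approximate molecular weight can be checked with a short calculation. The sketch below uses the coding-region coordinates 114..1049 given in the sequence listing for the 1233 bp cDNA, and assumes an average residue mass of about 110 Da, which is a common rule of thumb and not a value taken from this document. The function name is hypothetical.

```python
def cds_summary(cds_start: int, cds_end: int, avg_residue_da: float = 110.0):
    """Return (length in nt, number of codons, rough mass in kDa) for a CDS.

    cds_start and cds_end are 1-based inclusive coordinates on the cDNA.
    avg_residue_da is an assumed average residue mass, used only for a
    back-of-the-envelope estimate.
    """
    length_nt = cds_end - cds_start + 1
    if length_nt % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    codons = length_nt // 3
    mass_kda = codons * avg_residue_da / 1000.0
    return length_nt, codons, mass_kda


if __name__ == "__main__":
    nt, residues, kda = cds_summary(114, 1049)
    print(nt, residues, round(kda, 1))
```

The estimate lands close to the 34.5 kDa stated above; an exact value would require summing the masses of the actual residues.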

Example 4

Chromosomal Mapping of JTV1 

The hPMS2 locus was previously mapped to chromosome 7p22 by FISH
using P1 clone 53 (Nicolaides et al., 1994). Because multiple hPMS2-related
genes are located on the long arm of chromosome 7 and have conserved 5' regions
(personal observation; Hori et al., 1994), we confirmed the genomic localization
of JTV1 by PCR analysis of rodent-human somatic cell hybrid DNAs containing
various regions of chromosome 7 (Scherer et al., 1993; Powers et al., 1993).
PCR primers were chosen from the 3' untranslated region of hPMS2 and JTV1 and
shown to amplify genomic DNA. hPMS2 primers J and K yielded a 121 bp
product and JTV1 primers Q and R yielded a 134 bp product. PCR products for
both genes were formed in those DNAs containing the 7p22 region: lines
GM10791 (containing the entire human chromosome 7), NA11440 (Coriell
Institute) (7p22>7pter) and Ru-Rag4-13 (7cen-7pter) (Figure 3, lanes 1, 2, and 3).
No products were observed in lines 4AF1/106/K015 (7cen-qter), GM05184.17
(7q21.2-qter), or 2068Rag22-2 (7q22-qter) (Figure 3, lanes 4, 5, and 6).

Example 5

Analysis of the 5' Termini of hPMS2 and JTV1.

The 5' termini of hPMS2 transcripts were studied by standard cDNA
cloning, RACE, and RT-PCR analyses. RNA was purified from tissues and cells
using a guanidine isothiocyanate based method (Chomczynski and Sacchi, 1987).
Reverse transcriptase-polymerase chain reaction (RT-PCR) was performed using
randomly primed cDNA as template as described (Leach et al., 1993). RT-PCR
of the 5' end of hPMS2 was performed using a common antisense primer (I) and
the sense primers (A-F) described in Table 1. RT-PCR mapping of the 5' end of
JTV1 was done using a common antisense primer P and the sense primers L-O as
described in Table 2. RACE (rapid amplification of cDNA ends; Frohman et al.,
1988) was performed on hPMS2 using sequential antisense primers I and G (Table
1) following the manufacturer's protocol (Clontech). RACE analysis of JTV1 was
done using the antisense primer P (Table 2). Amplification products were cloned
into a T-tailed vector (InVitrogen) and sequenced using SP6 and T7 primers.
Amplifications were done at 95°C for 30 sec, 56°C for 1.5 min, and 70°C for
1.5 min for 35 cycles. Reaction products were separated by electrophoresis in 6%
nondenaturing polyacrylamide gels.

Figure 4 shows the sequence of the genomic region containing the
transcriptional initiation sites of both hPMS2 and JTV1, numbered as in Figure 1
with respect to hPMS2. The 5' ends of hPMS2 cDNA clones are marked with
arrowheads on the top strand. One clone began at nt -14, one at nt -24, and two
at nt -53. RACE products were generated from adult brain, leukocyte, and
placenta mRNA. Using an antisense primer corresponding to nt +116 to +136,
multiple bands with approximately 160 to 191 bps were observed in addition to
less intense bands of up to 550 bp. The sequence of four cloned RACE products
demonstrated that, as expected, their 5' ends were located between nt -25 to -55.
These data suggested that the majority of hPMS2 transcripts initiated between nt
-13 to -55, with a minority extending further upstream. This was confirmed by
RT-PCR analysis using mRNA from HeLa cells as template. Robust RT-PCR
products were amplified with sense primers whose 5' ends were at nt -14, -110,
-283, and -414 (primers A, B, C, and D; Table 1) and an antisense primer
corresponding to nt +90 to +107 (G). No PCR products were observed using
sense primers whose 5' ends were at nt -448 or -487 (primers E and F). To
ensure that primers E and F were not defective, successful amplification of
genomic DNA was performed using these primers and an antisense primer (O)
corresponding to nt -2 to +16.
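
Because the two genes are read from opposite strands, a primer that is sense for one gene is antisense for the other, and its sequence on the other strand is its reverse complement. A minimal sketch follows; the function name is hypothetical, and the example input is primer O as listed in Table 2.

```python
# Complement mapping for unambiguous DNA bases, both cases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    primer_o = "tctcagctcgctccatgg"  # primer O, Table 2 (sense for JTV1)
    print(reverse_complement(primer_o))
```

Read on the hPMS2 sense strand, the reverse complement of primer O contains an ATG beginning at its third base, which fits its use as an hPMS2 antisense primer spanning nt -2 to +16.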

The 5' termini of JTV1 showed a heterogeneous pattern like that of hPMS2.
The 5' ends of the 12 cDNA clones are indicated by arrowheads on the bottom
strand in Figure 4. They were located 73 to 113 nt upstream of codon 1 of
JTV1, which corresponded to nt -271 to -232 of hPMS2. RACE confirmed the
cDNA results in that the majority of products generated using an antisense primer
P corresponding to JTV1 nt +157 were 230 to 270 bp. RT-PCR analysis was
performed with antisense primer P and several sense primers (L-O) listed in Table
2. PCR products were found with sense primers whose 5' ends were at -8, -23,
and -111 (primers L, M, and N) but not with a sense primer O whose 5' end was
at nt -360 with respect to JTV1 nt +1. The latter primer was not defective, as
a genomic segment could be successfully amplified with it.

Transcripts of hPMS2 had heterogeneous but collinear 5' termini,
containing 11 to 415 nt of presumably untranslated sequence. The transcripts
contained an in-frame stop codon upstream of the presumptive initiating
methionines (Figure 1), making the originally described methionine the most likely
translation initiator. Because no other upstream coding regions of hPMS2
appeared to exist, the size discrepancy between that predicted from the hPMS2
sequence and the 110 kDa hPMS2 protein identified by Li and Modrich is likely
due to post-transcriptional modifications or alternative internal exons.

Our results revealed that hPMS2 overlaps with a novel gene, JTV1,
transcribed from the opposite strand (Figure 4). This organization is similar to
that of HUMDUG, a mutS homolog found on human chromosome 5, and the
dihydrofolate reductase (DHFR) gene (Fujii and Shimada, 1989). Both hPMS2-JTV1
and HUMDUG-DHFR lie in a head to head arrangement, both genes are
ubiquitously expressed, and both have multiple 5' termini. It has been
hypothesized that DHFR and HUMDUG may be regulated via a bidirectional
promoter, because a minor subset of the transcripts from the two genes overlap.
The major transcripts of HUMDUG and DHFR, however, do not overlap, as is
true for hPMS2 and JTV1. It will be of interest to determine whether other
mismatch repair genes are arranged in a head to head fashion with a contiguous
gene and if JTV1 is involved in DNA replication or repair.

Example 6

Expression of hPMS2 and JTV1.

The expression of hPMS2 and JTV1 was analyzed in a variety of mRNA
samples prepared from human tissues. RT-PCR was performed on cDNA
templates derived from adult brain, leukocytes, kidney, large intestine, colon,
salivary gland, lung, testes and prostate using primers J and K for hPMS2 and
primers Q and R for JTV1 (Tables 1 and 2). Both genes were expressed in all
tissues tested (Figure 5).

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: Vogelstein, Bert
Kinzler W., Kenneth
Nicolaides C., Nicholas

(ii) TITLE OF INVENTION: Human JTV1 Gene Overlaps PMS2 Gene

(iii) NUMBER OF SEQUENCES: 5

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: Banner & Allegretti, Ltd.
(B) STREET: 1001 G Street, NW
(C) CITY: Washington DC
(E) COUNTRY: U.S.A.
(F) ZIP: 20001

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER:
(B) FILING DATE:
(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: Kagan A., Sarah
(B) REGISTRATION NUMBER: 32,141
(C) REFERENCE/DOCKET NUMBER: 1107.49697

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: 202-508-9100
(B) TELEFAX: 202-508-9299



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 384 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Homo sapiens

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 46..384

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

TTACCTGCTA CATCCGCATC CCACAACCAA ACCAAAACGC CGTAC CCC CTC CCA 54 

Arg Val Pro 
1 

AAG GCC AAC GCT CAC AAA CCC TCA GAG GTC ACG ACC GAG ACC GGC CAC X02 
Lys Ala Asn Ala Gin Lys Pro Ser Glu Val Thr Thr Glu Thr Gly His 
S 10 15 

CTC CCT TCT GAC CCT GCT GC6 GGC GTT CGG GAA AAC CCA GTC CGG TGT 150 
Lau Pro Ser Asp Pro Ala Ala Oly Val Arg Glu Asn Ala Val Arg Cya 
20 25 30 35 

GCT CTG ATT GGC CCA GGC TCT TTG ACQ TCA OCA ACT CGA CCT TTG ACA 196 
Ala Leu lie Gly Pro Gly Ser Leu Thr Ser Arg Ser Arg Pro Leu Thr 
40 45 SO 

GAG CCA ATA GGC GAA AAG GAG AGA CGG GAA 6TA TTT TTG CCG CCC CQC 246 
Glu Pro lie Gly Glu Lye Glu Arg Arg Glu Val Phe Leu Pro Pro Arg 
55 60 65 

COO GAA AGG GTO GAG CAC AAC GTC GAA AGO AGC CAA TOO GAG TTC AGO 294 
Pro Glu Arg val Glu Hie Asa Val Glu Ser Ser Gin Trp Olu Phe Arg 
70 75 80 

AGG CGG AGC GCC TGT GGG AGC OCT GGA GGG AAC TTT CCC AGT CCC OGA 342 
Arg Arg Ser Ala Cys Gly Ser Pro Gly Gly Aen Phe Pro Ser Pro Arg 
85 90 95 

GGC GGA TOO GGT GTT GCA TCC ATG GAG OGA GCT GAG AGC TOG 384 
Gly Gly ser Gly Val Ala Ser Met Glu Arg Ala Glu Ser Ser 
100 105 110 

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 113 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Arg Val Pro Lys Ala Asn Ala Gln Lys Pro Ser Glu Val Thr Thr Glu
1               5                   10                  15

Thr Gly His Leu Pro Ser Asp Pro Ala Ala Gly Val Arg Glu Asn Ala
            20                  25                  30

Val Arg Cys Ala Leu Ile Gly Pro Gly Ser Leu Thr Ser Arg Ser Arg
        35                  40                  45

Pro Leu Thr Glu Pro Ile Gly Glu Lys Glu Arg Arg Glu Val Phe Leu
    50                  55                  60

Pro Pro Arg Pro Glu Arg Val Glu His Asn Val Glu Ser Ser Gln Trp
65                  70                  75                  80

Glu Phe Arg Arg Arg Ser Ala Cys Gly Ser Pro Gly Gly Asn Phe Pro
                85                  90                  95


wo 97/08312 PCT/US96/13598 




Ser Pro Arg Gly Gly Ser Gly Val Ala Ser Met Glu Arg Ala Glu Ser
            100                 105                 110

Ser

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1233 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Homo sapiens

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 114..1049

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:
CCGAACGCCC GCAGCAGGGT CAGAAGGGA6 GTGGCCGGTC TCOCTCGTGA CCTCTGAOGG 60 
TTTCTGACCG TTCGCCTTT C GCAOCCGCTA CACCCTTTTG CTTTGGTTCT GCC ATG 116 

Met 

1 

CCG ATG TAG CAO GTA AAG COC TAT CAC GGG GGC GGC GCG CCT CTC CGT 164 
Pro Met Tyr Gin Val Lys Pro Tyr His Gly Gly Gly Ala Pro Leu Arg 
5 10 15 

GTG GAG CTT CCC ACC TGC ATG TAC CGG CTC CCC AAC GTG CAC GGC AGO 212 
Val Glu Leu Pro Thr Cys Met Tyr Arg Leu Pro Asn Val His Gly Arg 
20 25 30 

AGC TAC GGC CCA GCG COG GGC GCT GGC CAC GTG CAG GAA GAG TCT AAC 260 
Ser Tyr Gly Pro Ala Pro Gly Ala Gly His Val Gin Glu Glu Ser Asn 
35 40 45 

CTC TCT CTG CAA. GCT CTT GAG TCC CGC CAA GAT GAT ATT TTA AAA CGT 308 
Leu Ser Leu Gin Ala Leu Glu Ser Arg Gin Asp Asp lie Leu Lys Arg 
SO 55 60 65 

CTG TAT GAG TTG AAA GCT GCA GTT GAT GGC CTC TCC AAG ATG ATT CAA 356 
Leu Tyr Glu Leu Lys Ala Ala Val Asp Gly Leu Ser Lys Met lie Gin 
70 75 80 

ACA CCA GAT GCA GAC TTG GAT GTA ACC AAC ATA ATC CAA GCG GAT GAG 404 
Thr Pro Asp Ala Asp Leu Asp Val Thr Asn lie lie Gin Ala Asp Glu 
85 90 95 



CCC ACG ACT TTA ACC ACC AAT GCG CTG GAC TTG AAT TCA CTG CTT GGG 
Pro Thr Thr Leu Thr Thr Asn Ala Leu Asp Leu Asn ser Val Leu Gly 
100 105 110 



452 
AAC GAT TAG GCG GCG CTG AAA GAG ATC GTG ATC AAC GCA AAC CCG GCC 500
Lys Asp Tyr Gly Ala Leu Lys Asp lie Val lie Asn Ala Asn Pro Ala 
lis 120 125 

TCC GOT CCC CTC TCC CTG CTT GTG CTG CAC AGO CTG CTC TGT GAG CAC 548 
Ser Pro Pro Leu Ser Leu Leu Val Leu His Arg Leu Leu Cys Glu Hla 
130 135 140 145 

TTC ACC CTC CTC TCC AOO CTC CAC ACC CAC TCC TCG CTC AAC ACC GTG B9€ 
Phe Arg Val Leu Ser Thr Val His Thr His Ser Ser Val Lys Ser Val 
150 155 160 

CCT GAA AAC CTT CTC AAC TCC TTT CCA GAA GAG AAT AAA AAA CAC CCC 644 
Pro Glu Aan Leu Leu Lys Cys Phe Gly Glu Gin Asn Lys Lys cin Pro 
165 170 175 

CCC CAA CAC TAT CAC CTC CCA TTC ACT TTA ATT TGG AAG AAT GTG CCG 692 
Arg Gin Asp Tyr Gin Leu Gly Phe Thr Leu lie Trp Lys Asn Val Pro 
180 185 190 

AAG ACC CAC ATC AAA TTC AGO ATC CAC AGO ATC TCC CCC ATC GAA GGC 740 
Lys Thr Ola Met Lys Phe Ser lie Gin Thr Met Cys Pro lie Glu Gly 
195 200 205 

CAA CCG AAC ATT CCA OGT TTC TTC TTC TCT CTC TTT GGC CAC AAC CAT 788 
Glu Gly Asn He Ala Arg Phe Leu Phe Ser Leu Phe Gly Gin Lys His 
210 215 220 225 

AAT CCT CTC AAC GCA ACC CTT ATA GAT ACC TGG CTA CAT ATT GCG ATT 836 
Asn Ala Val Asn Ala Thr Leu He Asp Ser Trp Val Asp He Ala He 
230 235 240 

TTT CAO TTA AAA GAG 6GA AGC ACT AAA GAA AAA CCC CCT CTT TTC CCC 884 
Phe Gin Leu Lys Glu Gly Ser Ser Lys Glu Lys Ala Ala Val Phe Arg 
245 250 255 

TCC ATO AAC TCT CCT CTT CCG AAC ACC CCT TGG CTC CCT CCC AAT GAA 932 
ser Met Asn Ser Ala Leu Gly Lys Ser Pro Trp Leu Ala Gly Asn Glu 
260 265 270 

CTC ACC CTA GCA CAC GTG CTG CTC TGG TCT CTA CTC GAG CAC ATC GCA 980 
Leu Thr Val Ala Asp Val .Val Leu Trp Ser Val Leu Gin Gin He Gly 
275 280 285 

GCC TCC ACT GTG ACA GTG CCA GCC AAT GTG CAC AGC TGG ATC ACC TCT 1028 
Gly Cys Ser Val Thr Val Pro Ala Asn Val Gin Arg Trp Met Arg Ser 
290 ^ 295 300 305 

TGT GAA AAC CTG CCT CCT TTT TAACACGGCC CTCAAGCTCC TTT^GTGAAT 1079 
Cys Glu Asn Leu Ala Pro Phe 
310 

TCCCGTAACT GATTTTAAAG GGTTTAGATT TTAACAATGG TrJCTCTTTCP TGCCTATTM 1139 

CAGTAAGGGG ACTTGTATTA GAGTCAGAGT CTTTTTATTT AGGCCAGTTG TCAAGTCT^/^ 1199 

ATAAAAGCGC ATCATGTAAT TTAAAAAAAA AAAA i233 



(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 312 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

Met Pro Met Tyr Gln Val Lys Pro Tyr His Gly Gly Gly Ala Pro Leu
1               5                   10                  15

Arg Val Glu Leu Pro Thr Cys Met Tyr Arg Leu Pro Asn Val His Gly
            20                  25                  30

Arg Ser Tyr Gly Pro Ala Pro Gly Ala Gly His Val Gln Glu Glu Ser
        35                  40                  45

Asn Leu Ser Leu Gln Ala Leu Glu Ser Arg Gln Asp Asp Ile Leu Lys
    50                  55                  60

Arg Leu Tyr Glu Leu Lys Ala Ala Val Asp Gly Leu Ser Lys Met Ile
65                  70                  75                  80

Gln Thr Pro Asp Ala Asp Leu Asp Val Thr Asn Ile Ile Gln Ala Asp
                85                  90                  95

Glu Pro Thr Thr Leu Thr Thr Asn Ala Leu Asp Leu Asn Ser Val Leu
            100                 105                 110

Gly Lys Asp Tyr Gly Ala Leu Lys Asp Ile Val Ile Asn Ala Asn Pro
        115                 120                 125

Ala Ser Pro Pro Leu Ser Leu Leu Val Leu His Arg Leu Leu Cys Glu
    130                 135                 140

His Phe Arg Val Leu Ser Thr Val His Thr His Ser Ser Val Lys Ser
145                 150                 155                 160

Val Pro Glu Asn Leu Leu Lys Cys Phe Gly Glu Gln Asn Lys Lys Gln
                165                 170                 175

Pro Arg Gln Asp Tyr Gln Leu Gly Phe Thr Leu Ile Trp Lys Asn Val
            180                 185                 190

Pro Lys Thr Gln Met Lys Phe Ser Ile Gln Thr Met Cys Pro Ile Glu
        195                 200                 205

Gly Glu Gly Asn Ile Ala Arg Phe Leu Phe Ser Leu Phe Gly Gln Lys
    210                 215                 220

His Asn Ala Val Asn Ala Thr Leu Ile Asp Ser Trp Val Asp Ile Ala
225                 230                 235                 240

Ile Phe Gln Leu Lys Glu Gly Ser Ser Lys Glu Lys Ala Ala Val Phe
                245                 250                 255

Arg Ser Met Asn Ser Ala Leu Gly Lys Ser Pro Trp Leu Ala Gly Asn
            260                 265                 270

Glu Leu Thr Val Ala Asp Val Val Leu Trp Ser Val Leu Gln Gln Ile
        275                 280                 285

Gly Gly Cys Ser Val Thr Val Pro Ala Asn Val Gln Arg Trp Met Arg
    290                 295                 300

Ser Cys Glu Asn Leu Ala Pro Phe
305                 310

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 900 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Homo sapiens

(ix) FEATURE:
(A) NAME/KEY: mRNA
(B) LOCATION: complement (1..900)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:



ACACCOGGCC AATTTCTGTA TTTTXA0TAO AGAOQAaOTT TTACCATGTT G6CCAGGCTA 60
GTCTOQAACT CCTQACCTCA GGTQATCCOC CCGCCTOOGC CTCCCAAAGT GCTGGGATTA 120
CACCOGTGAC CCACGGCGCC CGOOCTOQAT AAATCTTTTA AAAGATAAAA GTCTGAGTGA 180
GTX3CCSQGCC CGCCGGCACA GATGCCGGGO TOOGGCCOTO AACCGQTTGO GACGCGCTC6 240
CTCCG(#CCTG GGGGGACCCG GGCCAOCAGC COGTCGOCQC GOGTCJCOCAC TGGGCGGGGG 300
GCOCOGCGCT CCTACCTOCA CGTGOCCAGG CCCGOCGCTG 06CCGTAOCT CCTGCOGTGC 360
ACGTTGGGGA GCCGGTACAT OGAGGTGGGA AGCTCCACAC G0AGA66CGC GCCGCCCCC6 420
TGATAC(SGCT TTACCTGGTA CATCGGCATG GCAGAACCAA AGCAAAAGiMS GGTACOGOGT 480
GCCAAAGCCC AAOGCTCAGA AACCGTCAGA GGTCACGACG GAGACCGGCC ACCTCCCTTC 540
TGACCCTCCT GCGCCCGTTC GGGAAAACGC AGTCCCGTGT GCTCTGATTG GCCCAGGCCC 600
TTTCACGTCA OGAAGTCGAC CTTTGACAGA GCCAATAGGC GAAAAGGAGA GACGGGAAGT 660
ATTTTTCCCC CCCCCCCCGG AAAGGGTCGA GCACAACGTC GAAAGCAGCC AATGGGAGTT 720
CAGGAGGCGC AGCGCCTGTG GGAGCCCTCG AGGGAACTTT CCCAGTCCCC CAGGCGGATC 780
CCCTGTTCCA TCCATGGACC CAGCTGACAG CTCGAGGTGA GCGGGGCTCO CAOTCTTCCG 840
CTGTCCCCTC TCCOCCGCCC TCTTTGACAC CCAOCGCATT CCAACCTCCC TGGAAATGGG 900

CLAIMS

1. A segment of cDNA consisting of the nucleotide sequence shown
in Figure 2.

2. A vector comprising the segment of DNA of claim 1.

3. A host cell which comprises the vector of claim 2.

4. A composition consisting essentially of a protein consisting of the
amino acid sequence shown in Figure 2.

5. A composition of protein JTV1 as shown in Figure 1, wherein said
composition is free of other human proteins.

6. A segment of cDNA which encodes the amino acid sequence of
JTV1 protein shown in Figure 2.

7. A cDNA probe wherein said cDNA consists of between 15 and 1176
contiguous nucleotides of the sequence shown in SEQ ID NO:1.